Principled Parsing for Indentation-Sensitive Languages
Abstract
Many languages, such as Haskell, Python, and F#, use the indentation and layout of code as part of their syntax. Because context-free grammars cannot express these layout rules, existing parsers use ad hoc techniques to handle them. These techniques tend to be low-level and operational in nature, and thus forgo the advantages of more declarative specifications like context-free grammars. For example, they are often coded by hand instead of being generated by a parser generator. This paper presents a simple extension to context-free grammars for expressing these layout rules and derives CYK, GLR, and LR(k) algorithms for parsing these languages. These grammars are easy to write and can be parsed efficiently. Examples for several languages are presented, as are benchmarks showing the practical efficiency of these algorithms.
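To make the contrast concrete, the following is a minimal sketch of the kind of ad hoc, operational technique the abstract refers to: a hand-written pass that keeps a stack of indentation widths and emits explicit INDENT and DEDENT tokens, similar in spirit to how Python's tokenizer handles layout. It is not the grammar extension proposed in the paper. The function name `layout_tokens`, the token names, and the restriction to space-only indentation are assumptions made for illustration.

```python
def layout_tokens(source):
    """Turn indentation into explicit INDENT/DEDENT tokens (illustrative only)."""
    stack = [0]      # indentation widths of the currently open blocks
    tokens = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines do not affect layout
        # Assumption: indentation consists of spaces only (tabs are not handled).
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the enclosing block: open a new block.
            stack.append(width)
            tokens.append(("INDENT", width))
        while width < stack[-1]:
            # Shallower: close blocks until a matching width is found.
            stack.pop()
            tokens.append(("DEDENT", stack[-1]))
        if width != stack[-1]:
            raise SyntaxError(f"inconsistent dedent to column {width}")
        tokens.append(("LINE", line.strip()))
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        tokens.append(("DEDENT", stack[-1]))
    return tokens
```

A context-free parser can then treat INDENT and DEDENT like opening and closing brackets. The layout logic itself, however, lives in imperative code outside the grammar, which is the drawback the abstract attributes to existing approaches.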
Similar resources
Layout-Sensitive Generalized Parsing
The theory of context-free languages is well-understood and context-free parsers can be used as off-the-shelf tools in practice. In particular, to use a context-free parser framework, a user does not need to understand its internals but can specify a language declaratively as a grammar. However, many languages in practice are not context-free. One particularly important class of such languages ...
The Effect of Morphemes on Persian Dependency Parsing
Data-driven systems can be adapted to different languages and domains easily. This trend has led to the introduction of data-driven approaches to dependency parsing. The existence of appropriate corpora that contain sentences and their associated dependency trees is the only prerequisite for data-driven approaches. Despite obtaining highly accurate results for the dependency parsing task in English langu...
Natural and Flexible Error Recovery for Generated Parsers
Parser generators are an indispensable tool for rapid language development. However, they often fall short of the finesse of a hand-crafted parser, built with the language semantics in mind. One area where generated parsers have provided unsatisfactory results is that of error recovery. Good error recovery is both natural, giving recovery suggestions in line with the intention of the programmer...
Occam's Razor: the Cutting Edge for Parser Technology
Yacc is well established in the compiler-compiler field, but is beginning to show its age. Issues which were important when hardware resources were more scarce are now less critical. Precc is a new compiler-compiler tool that is much more versatile than yacc, whilst retaining efficiency of operation on modern computers. It copes with the context-dependent BNF grammar descriptions and higher order m...
A declarative extension of parsing expression grammars for recognizing most programming languages
Parsing Expression Grammars are a popular foundation for describing syntax. Unfortunately, several syntactic constructs of programming languages are still hard to recognize with pure PEGs. Notorious cases include typedef-defined names in C/C++, indentation-based code layout in Python, and here documents in many scripting languages. To recognize such PEG-hard syntax, we have addressed a declarative extension ...
Publication date: 2012